Summary
In this notebook we will open a netCDF4 file from the Earth Surface Minteral Dust Source Investigation (EMIT) as an xarray.Dataset. We will then extract the spectra at point coordinates from a .csv as a dataframe, then save and plot the data.
Requirements:
emit_tutorials environment as the kernel for this notebook.../data/ folder.Learning Objectives
xarray.Dataset.csv fileImport the required Python libraries.
# Import Packages
import sys
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas
import holoviews as hv
sys.path.append('../modules/')
from emit_tools import emit_xarray
Next, set the path to an EMIT dataset as an object. In this example we use an L2A Reflectance file.
fp = '../data/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'
Open the file downloaded and defined as fp. To do this, we will use the emit_tools module which contains a few helpful functions that can be used with EMIT data.
ds = emit_xarray(fp)
ds
<xarray.Dataset>
Dimensions: (latitude: 2009, longitude: 2353, bands: 285)
Coordinates:
* latitude (latitude) float64 -39.31 -39.31 -39.31 ... -40.39 -40.4 -40.4
* longitude (longitude) float64 -62.51 -62.51 -62.51 ... -61.24 -61.24
wavelengths (bands) float32 381.0 388.4 395.8 ... 2.486e+03 2.493e+03
fwhm (bands) float32 8.415 8.415 8.415 8.415 ... 8.806 8.807 8.809
spatial_ref int32 0
Dimensions without coordinates: bands
Data variables:
reflectance (latitude, longitude, bands) float32 nan nan nan ... nan nan
Attributes: (12/38)
ncei_template_version: NCEI_NetCDF_Swath_Template_v2.0
summary: The Earth Surface Mineral Dust Source ...
keywords: Imaging Spectroscopy, minerals, EMIT, ...
Conventions: CF-1.63
sensor: EMIT (Earth Surface Mineral Dust Sourc...
instrument: EMIT
... ...
southernmost_latitude: -40.39610428069674
spatialResolution: 0.000542232520256367
spatial_ref: GEOGCS["WGS 84",DATUM["WGS_1984",SPHER...
geotransform: [-6.25120945e+01 5.42232520e-04 -0.00...
day_night_flag: Day
title: EMIT L2A Surface Reflectance 60 m V001Now open the .csv included in the /data/ directory as a pandas.dataframe.
Note: The category values here are arbitrary and included as an example of an additional column users may want.
points = pd.read_csv('../data/sample_coords.csv')
points
| ID | Category | Latitude | Longitude | |
|---|---|---|---|---|
| 0 | 0 | 1 | -39.94 | -62.36 |
| 1 | 1 | 1 | -39.75 | -61.74 |
| 2 | 2 | 3 | -40.00 | -62.10 |
| 3 | 3 | 2 | -39.89 | -61.85 |
| 4 | 4 | 3 | -39.38 | -62.03 |
Set the points dataframe index as ID to utilize our existing point ID's as an index.
points = points.set_index(['ID'])
Convert the dataframe to an xarray.Dataset
xp = points.to_xarray()
xp
<xarray.Dataset>
Dimensions: (ID: 5)
Coordinates:
* ID (ID) int64 0 1 2 3 4
Data variables:
Category (ID) int64 1 1 3 2 3
Latitude (ID) float64 -39.94 -39.75 -40.0 -39.89 -39.38
Longitude (ID) float64 -62.36 -61.74 -62.1 -61.85 -62.03Select the data from our EMIT dataset using the Latitude and Longitude coordinates from our point dataset, then convert the output to a pandas.dataframe.
extracted = ds.sel(latitude=xp.Latitude,longitude=xp.Longitude, method='nearest').to_dataframe()
extracted
| reflectance | latitude | longitude | wavelengths | fwhm | spatial_ref | ||
|---|---|---|---|---|---|---|---|
| ID | bands | ||||||
| 0 | 0 | 0.021049 | -39.940087 | -62.360269 | 381.005585 | 8.415 | 0 |
| 1 | 0.021988 | -39.940087 | -62.360269 | 388.409210 | 8.415 | 0 | |
| 2 | 0.022930 | -39.940087 | -62.360269 | 395.815826 | 8.415 | 0 | |
| 3 | 0.023884 | -39.940087 | -62.360269 | 403.225403 | 8.415 | 0 | |
| 4 | 0.025078 | -39.940087 | -62.360269 | 410.638000 | 8.417 | 0 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 280 | 0.084580 | -39.379961 | -62.030050 | 2463.381592 | 8.803 | 0 |
| 281 | 0.081152 | -39.379961 | -62.030050 | 2470.767822 | 8.804 | 0 | |
| 282 | 0.072722 | -39.379961 | -62.030050 | 2478.153076 | 8.806 | 0 | |
| 283 | 0.061968 | -39.379961 | -62.030050 | 2485.538574 | 8.807 | 0 | |
| 284 | 0.057322 | -39.379961 | -62.030050 | 2492.923828 | 8.809 | 0 |
1425 rows × 6 columns
The output is a longform dataframe using the 'ID' field as an index. This is missing our 'Category' column from our original dataframe. Use the pd.join function to add the 'Category' column to our dataset using 'ID' as an index.
df = extracted.join(points['Category'], on=['ID'])
df
| reflectance | latitude | longitude | wavelengths | fwhm | spatial_ref | Category | ||
|---|---|---|---|---|---|---|---|---|
| ID | bands | |||||||
| 0 | 0 | 0.021049 | -39.940087 | -62.360269 | 381.005585 | 8.415 | 0 | 1 |
| 1 | 0.021988 | -39.940087 | -62.360269 | 388.409210 | 8.415 | 0 | 1 | |
| 2 | 0.022930 | -39.940087 | -62.360269 | 395.815826 | 8.415 | 0 | 1 | |
| 3 | 0.023884 | -39.940087 | -62.360269 | 403.225403 | 8.415 | 0 | 1 | |
| 4 | 0.025078 | -39.940087 | -62.360269 | 410.638000 | 8.417 | 0 | 1 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 280 | 0.084580 | -39.379961 | -62.030050 | 2463.381592 | 8.803 | 0 | 3 |
| 281 | 0.081152 | -39.379961 | -62.030050 | 2470.767822 | 8.804 | 0 | 3 | |
| 282 | 0.072722 | -39.379961 | -62.030050 | 2478.153076 | 8.806 | 0 | 3 | |
| 283 | 0.061968 | -39.379961 | -62.030050 | 2485.538574 | 8.807 | 0 | 3 | |
| 284 | 0.057322 | -39.379961 | -62.030050 | 2492.923828 | 8.809 | 0 | 3 |
1425 rows × 7 columns
Now we have a dataframe containing our initial data, in addition to the extracted point data. This a a good place to save an output as a .csv. Go ahead and do that below.
df.to_csv('../data/example_out.csv')
We can use our dataframe to plot the reflectance data we extracted, but first, mask the reflectance values of -0.01, which represent deep water vapor absorption regions. To do this, assign values where the reflectance = -0.01 to np.nan.
df.reflectance[df.reflectance == -0.01] = np.nan
C:\Users\ebolch\AppData\Local\Temp\3\ipykernel_11236\3052335053.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df.reflectance[df.reflectance == -0.01] = np.nan
Plot the data using hvplot. We can use by= to separate the reflectances by their latitude and longitude.
df.hvplot(x='wavelengths',y='reflectance', by=['latitude','longitude']).opts(xlabel='Wavelengths (nm)',ylabel='Reflectance')
Email: LPDAAC@usgs.gov
Voice: +1-866-573-3222
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹
Website: https://lpdaac.usgs.gov/
Date last modified: 01-04-2023
¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.